Building a Chinese Shallow Parsed TreeBank for Collocation Extraction
نویسندگان
چکیده
To automatically extract Chinese collocations and build a large-scale collocation bank, we are developing a one-million-word Chinese shallow parsed treebank. The treebank can be used not only as a training set for our shallow parser, but also as processed data from which collocations are extracted. This paper presents several issues related to this on-going project, such as our definition of shallow parsing used in Chinese collocation extraction, guideline preparation, and quality control.
منابع مشابه
TCtract-A Collocation Extraction Approach for Noun Phrases Using Shallow Parsing Rules and Statistic Models
This paper presents a hybrid method for extracting Chinese noun phrase collocations that combines a statistical model with rule-based linguistic knowledge. The algorithm first extracts all the noun phrase collocations from a shallow parsed corpus by using syntactic knowledge in the form of phrase rules. It then removes pseudo collocations by using a set of statistic-based association measures (...
متن کاملUsing Collocation Statistics in Information Extraction
Our main objective in participating MUC-7 is to investigate and experiment with the use of collocation statistics in information extraction. A collocation is a habitual word combination, such as \weather a storm", \ le a lawsuit", and \the falling yen". Collocation statistics refers to the frequency counts of the collocational relations extracted from a parsed corpus. For example, out of 6577 i...
متن کاملAn Algorithm Combining Statistics-based and Rules-based for Chunk Identification of Chinese Sentences
Natural language processing (NLP) is a very hot research domain. One important branch of it is sentence analysis, including Chinese sentence analysis. However, currently, no mature deep analysis theories and techniques are available. An alternative way is to perform shallow parsing on sentences which is very popular in the domain. The chunk identification is a fundamental task for shallow parsi...
متن کاملApplying Maximum Entropy to Robust Chinese Shallow Parsing
Recently, shallow parsing has been applied to various information processing systems, such as information retrieval, information extraction, question answering, and automatic document summarization. A shallow parser is suitable for online applications, because it is much more efficient and less demanding than a full parser. In this research, we formulate shallow parsing as a sequential tagging ...
متن کاملRule-Based Extraction of English Verb Collocates from a Dependency-Parsed Corpus
We report on a rule-based procedure of extracting and labeling English verb collocates from a dependency-parsed corpus. Instead of relying on the syntactic labels provided by the parser, we use a simple topological sequence that we fill with the extracted collocates in a prescribed order. A more accurate syntactic labeling will be obtained from the topological fields by comparison of correspond...
متن کامل